PyTorch network parameters: weight and bias initialization explained


Weight initialization is critical when training neural networks: good initial weights can effectively avoid problems such as vanishing gradients.
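As a quick illustration (a hypothetical deep tanh stack, not taken from the original article), pushing a random batch through many layers shows the effect: with very small random weights the activations collapse toward zero, which is exactly the regime in which gradients vanish, while Xavier initialization keeps the signal scale roughly stable.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

def forward_std(init_fn, depth=20, width=256):
    """Pass a random batch through a deep tanh stack and report the output std."""
    layers = [nn.Linear(width, width, bias=False) for _ in range(depth)]
    for layer in layers:
        init_fn(layer.weight)
    x = torch.randn(64, width)
    with torch.no_grad():
        for layer in layers:
            x = torch.tanh(layer(x))
    return x.std().item()

# Very small random weights: the signal shrinks by a constant factor per layer
tiny = forward_std(lambda w: nn.init.normal_(w, std=0.01))
# Xavier keeps the activation variance roughly stable across layers
xavier = forward_std(nn.init.xavier_normal_)

print(tiny, xavier)  # tiny is many orders of magnitude smaller than xavier
```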

Several weight initialization methods are available in PyTorch; three of them are shown below for reference.

Note: the first method is not recommended; prefer the latter two.

```python
import torch.nn as nn

# Not recommended: matching on the class-name string is fragile
def weights_init(m):
    classname = m.__class__.__name__
    if classname.find('Conv') != -1:
        m.weight.data.normal_(0.0, 0.02)
    elif classname.find('BatchNorm') != -1:
        m.weight.data.normal_(1.0, 0.02)
        m.bias.data.fill_(0)

# Recommended: dispatch on the module type
def initialize_weights(m):
    if isinstance(m, nn.Conv2d):
        m.weight.data.normal_(0, 0.02)
        m.bias.data.zero_()
    elif isinstance(m, nn.Linear):
        m.weight.data.normal_(0, 0.02)
        m.bias.data.zero_()

# Recommended: use the nn.init API
def weights_init(m):
    if isinstance(m, nn.Conv2d):
        nn.init.xavier_normal_(m.weight.data)
        # xavier_normal_ requires a tensor with at least 2 dimensions,
        # so the 1-D bias is set to a constant instead
        nn.init.constant_(m.bias, 0)
    elif isinstance(m, nn.BatchNorm2d):
        nn.init.constant_(m.weight, 1)
        nn.init.constant_(m.bias, 0)
    elif isinstance(m, nn.BatchNorm1d):
        nn.init.constant_(m.weight, 1)
        nn.init.constant_(m.bias, 0)
```

After writing the weights_init function, use the model's apply method to initialize its weights.

```python
net = Residual()         # instantiate the network
net.apply(weights_init)  # recursively apply weight initialization
```
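A self-contained sketch of the same pattern (the Residual class above is not defined here, so a minimal two-layer stand-in is used); apply() visits every submodule recursively, so one call initializes the whole model:

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for the Residual network above
class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 8, kernel_size=3)
        self.bn = nn.BatchNorm2d(8)

    def forward(self, x):
        return self.bn(self.conv(x))

def weights_init(m):
    if isinstance(m, nn.Conv2d):
        nn.init.xavier_normal_(m.weight.data)
        nn.init.constant_(m.bias, 0)
    elif isinstance(m, nn.BatchNorm2d):
        nn.init.constant_(m.weight, 1)
        nn.init.constant_(m.bias, 0)

net = Net()
net.apply(weights_init)  # visits self.conv and self.bn recursively

print(net.conv.bias.abs().sum().item())  # 0.0: biases were zeroed
```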

Supplementary notes: PyTorch weight initialization and parameter grouping

1. Model parameter initialization

```python
import math
import torch
import torch.nn as nn
from torch.nn import Parameter

# ----- Option 1: initialize via model.apply(weights_init) -----
def weights_init(m):
    classname = m.__class__.__name__
    if classname.find('Conv') != -1:
        n = m.kernel_size[0] * m.kernel_size[1] * m.out_channels
        m.weight.data.normal_(0, math.sqrt(2. / n))  # He/Kaiming-style init
        if m.bias is not None:
            m.bias.data.zero_()
    elif classname.find('BatchNorm') != -1:
        m.weight.data.fill_(1)
        m.bias.data.zero_()
    elif classname.find('Linear') != -1:
        m.weight.data.normal_(0, 0.01)
        m.bias.data = torch.ones(m.bias.data.size())

# ----- Option 2: initialize directly in the __init__ constructor -----
for m in self.modules():
    if isinstance(m, nn.Conv2d):
        n = m.kernel_size[0] * m.kernel_size[1] * m.out_channels
        m.weight.data.normal_(0, math.sqrt(2. / n))
        if m.bias is not None:
            m.bias.data.zero_()
    elif isinstance(m, (nn.BatchNorm2d, nn.BatchNorm1d)):
        m.weight.data.fill_(1)
        m.bias.data.zero_()
    elif isinstance(m, nn.Linear):
        nn.init.xavier_uniform_(m.weight.data)
        if m.bias is not None:
            m.bias.data.zero_()

# ----- Option 3: initialize parameters defined in a custom module -----
self.weight = Parameter(torch.Tensor(out_features, in_features))
self.bias = Parameter(torch.FloatTensor(out_features))
nn.init.xavier_uniform_(self.weight)
nn.init.zeros_(self.bias)  # note: the function is zeros_, not zero_
# Other initializers:
# nn.init.constant_(tensor, val)
# nn.init.kaiming_uniform_(tensor)
# self.weight.data.normal_(std=0.001)
```
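The normal_(0, sqrt(2/n)) line in Option 1 is the He/Kaiming initialization for convolutions. A quick sketch (hypothetical layer sizes) checks the empirical standard deviation against the formula:

```python
import math
import torch
import torch.nn as nn

conv = nn.Conv2d(64, 128, kernel_size=3)
# n counts the connections per input unit: kH * kW * out_channels
n = conv.kernel_size[0] * conv.kernel_size[1] * conv.out_channels
conv.weight.data.normal_(0, math.sqrt(2. / n))

# With 3*3*64*128 weight samples, the empirical std should sit very
# close to the target sqrt(2/n)
print(conv.weight.std().item(), math.sqrt(2. / n))
```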

2. Grouping model parameters for weight_decay

```python
import torch.optim as optim

def separate_bn_prelu_params(model, ignored_params=None):
    # avoid a mutable default argument (the original used ignored_params=[])
    ignored_params = list(ignored_params or [])
    bn_prelu_params = []
    for m in model.modules():
        if isinstance(m, (nn.BatchNorm2d, nn.BatchNorm1d, nn.PReLU)):
            ignored_params += list(map(id, m.parameters()))
            bn_prelu_params += list(m.parameters())
    base_params = list(filter(lambda p: id(p) not in ignored_params,
                              model.parameters()))
    return base_params, bn_prelu_params, ignored_params

OPTIMIZER = optim.SGD([
    {'params': base_params, 'weight_decay': WEIGHT_DECAY},
    {'params': fc_head_param, 'weight_decay': WEIGHT_DECAY * 10},
    {'params': bn_prelu_params, 'weight_decay': 0.0},
], lr=LR, momentum=MOMENTUM)  # , nesterov=True
```
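A runnable sketch of the same grouping idea on a toy model (the model architecture, learning rate, and decay values here are made up for illustration): BatchNorm and PReLU parameters go into a group with weight_decay=0, everything else keeps the regular decay.

```python
import torch.nn as nn
import torch.optim as optim

# Toy stand-in for the real network: conv backbone with BN and PReLU
model = nn.Sequential(
    nn.Conv2d(3, 8, 3),
    nn.BatchNorm2d(8),
    nn.PReLU(),
)

def separate_bn_prelu_params(model):
    ignored_ids, bn_prelu_params = set(), []
    for m in model.modules():
        if isinstance(m, (nn.BatchNorm2d, nn.BatchNorm1d, nn.PReLU)):
            for p in m.parameters():
                ignored_ids.add(id(p))
                bn_prelu_params.append(p)
    base_params = [p for p in model.parameters() if id(p) not in ignored_ids]
    return base_params, bn_prelu_params

base_params, bn_prelu_params = separate_bn_prelu_params(model)

optimizer = optim.SGD([
    {'params': base_params, 'weight_decay': 5e-4},
    {'params': bn_prelu_params, 'weight_decay': 0.0},  # no decay on BN/PReLU
], lr=0.1, momentum=0.9)

# conv weight + bias land in group 0; BN weight/bias + PReLU a in group 1
print(len(base_params), len(bn_prelu_params))  # 2 3
```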

Note 1: PReLU(x) = max(0, x) + a * min(0, x), where a is a learnable parameter. Called without arguments, nn.PReLU() uses a single parameter a across all input channels; called as nn.PReLU(nChannels), a separate a is learned for each input channel.

Note 2: weight decay should not be applied when learning a, for good performance.

Note 3: by default a single a is learned, with initial value 0.25.
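The defaults in Notes 1 and 3 can be checked directly:

```python
import torch
import torch.nn as nn

prelu = nn.PReLU()         # one shared parameter a
per_channel = nn.PReLU(8)  # one a per input channel

print(prelu.weight.numel(), prelu.weight.item())  # 1 0.25
print(per_channel.weight.numel())                 # 8

# PReLU(x) = max(0, x) + a * min(0, x): negative inputs are scaled by a
x = torch.tensor([-2.0, 3.0])
print(prelu(x))  # tensor([-0.5000,  3.0000])
```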

3. Grouping parameters for weight_decay – other approaches

Section 2 covers typical parameter-grouping needs; this section handles more customized grouping. Reference: face_evoLVe_Pytorch-master

A custom learning-rate schedule:

```python
def schedule_lr(optimizer):
    for params in optimizer.param_groups:
        params['lr'] /= 10.
    print(optimizer)
```
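A minimal usage sketch (toy model and learning rate chosen for illustration), showing that each call divides every param group's learning rate by 10:

```python
import torch.nn as nn
import torch.optim as optim

def schedule_lr(optimizer):
    for params in optimizer.param_groups:
        params['lr'] /= 10.

model = nn.Linear(4, 2)
optimizer = optim.SGD(model.parameters(), lr=0.1)

schedule_lr(optimizer)
print(optimizer.param_groups[0]['lr'])  # ~0.01, up to float rounding
```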

Method 1: use model.modules() and obj.__class__ (more general)

```python
# model.modules() vs model.children(): model.modules() iterates recursively
# over every sub-layer of the model, while model.children() only iterates
# over the model's immediate children.
# The keyword test `if 'model' in ...` below relies on the model definition
# file: every nn.Module subclass defined in, say, model_resnet.py carries
# 'model_resnet' in str(layer.__class__), so all custom container modules
# can be filtered out in a single pass.
def separate_irse_bn_paras(model):
    paras_only_bn = []
    paras_no_bn = []
    for layer in model.modules():
        if 'model' in str(layer.__class__):
            # custom container module: skip it; its parameters are reached
            # again through the leaf layers visited below
            # (the original snippet is truncated at this point; the rest of
            # the body follows the face_evoLVe_Pytorch reference cited above)
            continue
        if 'batchnorm' in str(layer.__class__):
            paras_only_bn.extend(layer.parameters())
        else:
            paras_no_bn.extend(layer.parameters())
    return paras_only_bn, paras_no_bn
```

